Integration of phenotypic metadata and protein similarity in Archaea using a spectral bipartitioning approach
Authors
Abstract
In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein families. The degree to which sequences are conserved not only differs for each protein family, but also is affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distances on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online.
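The abstract names spectral bipartitioning but does not describe the algorithm. As a rough illustration only, the sketch below shows one common form of the technique: splitting a similarity graph in two by the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian). The function name `spectral_bipartition`, the unnormalized Laplacian, and the toy similarity scores are all assumptions made for this example. They are not taken from the paper, which may use a different formulation.

```python
import numpy as np


def spectral_bipartition(similarity):
    """Split nodes into two groups by the sign of the Fiedler vector.

    `similarity` is a square matrix of non-negative pairwise scores.
    This is a hypothetical helper for illustration, not the paper's code.
    """
    weights = np.asarray(similarity, dtype=float)
    weights = (weights + weights.T) / 2.0   # enforce symmetry
    np.fill_diagonal(weights, 0.0)          # ignore self-similarity

    degree = weights.sum(axis=1)
    laplacian = np.diag(degree) - weights   # unnormalized graph Laplacian

    # For symmetric input, eigh returns eigenvalues in ascending order,
    # so column 1 holds the Fiedler vector of a connected graph.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return fiedler >= 0


# Toy data (assumed): two tight groups of three "proteins",
# joined only by weak cross-group similarities.
scores = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.1, 0.0],
    [0.8, 0.7, 0.0, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.8, 0.7, 0.0],
])
labels = spectral_bipartition(scores)
```

Applying the same split recursively to each half yields a hierarchy of clusters without a single fixed similarity threshold, which is the limitation of threshold-based clustering that the abstract points out.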
Similar resources
3D Classification of Urban Features Based on Integration of Structural and Spectral Information from UAV Imagery
Three-dimensional classification of urban features is one of the important tools for urban management and the basis of many analyzes in photogrammetry and remote sensing. Therefore, it is applied in many applications such as planning, urban management and disaster management. In this study, dense point clouds extracted from dense image matching is applied for classification in urban areas. Appl...
Similarity measurement for describe user images in social media
Online social networks like Instagram are places for communication. Also, these media produce rich metadata which are useful for further analysis in many fields including health and cognitive science. Many researchers are using these metadata like hashtags, images, etc. to detect patterns of user activities. However, there are several serious ambiguities like how much reliable are these informa...
Providing a solution for data integration in organizations using web services
Increasing the speed and reducing the use of resources in the data integration process has always been the goal of developers and researchers in the process of data integration. The purpose of this study is to provide a solution using metadata as well as web browsing to speed up the process, so as to improve resources such as memory. The proposed solution is implemented using the three-layer ar...
A study of the reaction of web search engines to metadata records based on the combined method of microdata and linked data
The purpose of this research was to find out the reaction of Web Search Engines to Metadata records created based on the combined method of Rich Snippets and Linked Data. 200 metadata records in two groups (100 records as the control group with the normal structure and, 100 records created based on microdata and implemented in RDF/XML as experimental group) extracted from the information gatewa...
Pseudo-spectral Matrix and Normalized Grunwald Approximation for Numerical Solution of Time Fractional Fokker-Planck Equation
This paper presents a new numerical method to solve time fractional Fokker-Planck equation. The space dimension is discretized to the Gauss-Lobatto points, then we apply pseudo-spectral successive integration matrix for this dimension. This approach shows that with less number of points, we can approximate the solution with more accuracy. The numerical results of the examples are displayed.
Journal title:
Volume 37, Issue
Pages -
Publication date 2009